The primary goal of this analysis is to evaluate the ratings and customer reviews of AIB Bank branches in the Dublin area, using sentiment analysis to investigate customer experiences at these branches.
To do this, I compare and contrast the sentiment and content of reviews from the highest-ranked branches with those from the lowest-ranked branches. By understanding the sentiment and content of these reviews, AIB can identify areas for improvement, address customer concerns, and enhance the overall banking experience for its customers.
I collected data using Google Reviews and Google Maps, focusing on average customer ratings and customer reviews for all AIB banks in County Dublin. The dataset includes both the content of reviews and the associated ratings for each branch. The data was filtered to remove ATMs and any AIB building that does not interact with customers, e.g. the Head Office (see the 'data_prep.ipynb' notebook for more information).
I ranked the AIB Bank branches by their average ratings and identified the highest-ranked and lowest-ranked branches. I then merged the customer reviews from the top-ranked branches and the lowest-ranked branches (~120 reviews for each) and performed sentiment analysis on each set of reviews.
Google Maps: Information on branch location and average customer rating.
Google Reviews: Reviews from customers (non-verified reviews).
In terms of overall sentiment polarity, high-ranking branches display a high percentage of positive expressions, at 66% (see Figure 1).
In the reviews of high-ranked AIB branches, several words appear frequently; the top four include "thanks," "like," "good," and "helpful" (see Figure 2 for the wordcloud and Figure 3 for a barchart of terms).
These terms suggest that positive customer experiences, a sense of gratitude, and overall satisfaction significantly influence the feedback provided by customers.
The use of "thanks" and "helpful" particularly underscores the role of the staff in shaping a positive customer experience. Customers seem appreciative of the assistance and support they receive at these branches, which contributes to their overall positive perception.
Overall sentiment polarity for low-ranking branches displays a lower relative percentage of positive expressions and a higher share of negative ones (see Figure 1).
For low-ranked branches, reviews contain frequently mentioned words such as "cash," "money," "help," and "lodge" (see Figure 2 for Wordcloud, Figure 3 for barchart of terms).
"Cash" and "lodge" may be associated with customer concerns about the transition toward cashless banking practices, where some AIB branches no longer offer certain services traditionally associated with cash transactions.
Additionally, the use of "help" may signify issues related to customer assistance and problem resolution, potentially affecting the customer experience and overall satisfaction at these branches.
This analysis has provided valuable insights into the sentiment and content of customer reviews for AIB Bank branches in the Dublin area. The results underscore the importance of factors such as service quality, customer support, and financial transactions in shaping customer perceptions and ratings.
Overall, improving these aspects in branch operations can lead to higher customer satisfaction and better ratings for AIB Bank branches.
# importing python libraries to use for analysis
import pandas as pd # panel data
import numpy as np # numerical python, for transformations
import matplotlib.pyplot as plt # matplotlib for plotting
import seaborn as sns # seaborn for plotting
import re # regex match patterns
from better_profanity import profanity
from textblob import TextBlob
from wordcloud import WordCloud # wordcloud for visualizing significant terms in the reviews
from collections import Counter # counter for words count
import string
import plotly.express as px # interactive plots
from plotly.subplots import make_subplots
import plotly.graph_objects as go
I take a roughly equal number of reviews from branches with higher average ratings and from branches with lower average ratings.
# load in the df prepared in "data_prep" notebook - AIB branches in Dublin and summary of google reviews
df = pd.read_csv("../data/aib_banks_dublin_cleaned_reviews.csv")
df.head()
| | name | location | number | google_rating | num_reviews |
|---|---|---|---|---|---|
| 0 | AIB Bank | 100 101 Grafton Street | (01) 671 3011 | 2.9 | 109 |
| 1 | AIB Bank | 7/12 Dame St | (01) 679 3211 | 2.9 | 45 |
| 2 | AIB IFSC | Dublin 1 | (01) 829 1880 | 4 | 1 |
| 3 | AIB Bank | 61 Richmond St S | (01) 478 4533 | 1.9 | 49 |
| 4 | AIB Bank | 126 128 Capel St | (01) 872 1022 | 3.3 | 127 |
# remove rows with missing values, labelled as '-'
df = df[df['google_rating'] != '-']
# the '-' placeholder means ratings were read in as strings, so cast to float before sorting
df['google_rating'] = df['google_rating'].astype(float)
# here reviews are taken from the top-ranked and lowest-ranked branches without taking the
# number of reviews into account; a Bayesian average could be used to account for review counts
df.sort_values('google_rating', inplace=True)
df.head()
| | name | location | number | google_rating | num_reviews |
|---|---|---|---|---|---|
| 3 | AIB Bank | 61 Richmond St S | (01) 478 4533 | 1.9 | 49 |
| 5 | AIB Bank | Cabra County Dublin | (01) 868 0071 | 2.2 | 56 |
| 29 | AIB Bank | County Dublin | (01) 490 4607 | 2.2 | 19 |
| 27 | AIB Bank | Blackrock County Dublin | (01) 288 0705 | 2.3 | 37 |
| 17 | AIB Bank | Dublin 11 | (01) 842 0285 | 2.3 | 15 |
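As noted in the code comment above, a Bayesian average could account for differences in review counts by shrinking each branch's mean rating toward the global mean. A minimal sketch of that idea, using made-up branch data and an arbitrary pseudo-count `m`:

```python
import pandas as pd

# Toy data with the same columns as the cleaned CSV (values are made up)
toy = pd.DataFrame({
    "name": ["Branch A", "Branch B", "Branch C"],
    "google_rating": [4.0, 2.9, 1.9],
    "num_reviews": [1, 109, 49],
})

# Bayesian average: shrink each branch's rating toward the global mean,
# weighting the prior by a pseudo-count m (an arbitrary choice here)
m = 20
global_mean = toy["google_rating"].mean()
toy["bayes_rating"] = (
    (toy["num_reviews"] * toy["google_rating"] + m * global_mean)
    / (toy["num_reviews"] + m)
)
print(toy.sort_values("bayes_rating", ascending=False)[["name", "bayes_rating"]])
```

With only a single review, Branch A's 4.0 rating is pulled almost all the way to the global mean, so a lone enthusiastic review no longer dominates the ranking.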
Reviews from the highest-ranked and lowest-ranked branches were taken: 127 reviews for the higher-ranked branches and 115 for the lower-ranked.
# first define a function that will prepare the reviews for analysis. Each review is a single line in the txt file
def clean_review(review):
    # Return an empty string for missing values (e.g. NaN floats from pandas)
    if not isinstance(review, str):
        return ""
    # Convert to lowercase
    cleaned_review = review.lower()
    # Censor profanity
    cleaned_review = profanity.censor(cleaned_review)
    # Remove URLs
    cleaned_review = re.sub(r'http\S+', '', cleaned_review)
    # Replace special characters, punctuation, and non-alphanumeric characters with spaces
    cleaned_review = re.sub(r'[^a-z0-9 ]', ' ', cleaned_review)
    # Split the review into words
    words = cleaned_review.split()
    # Define a list of stopwords to remove
    stopwords = ["for", "on", "an", "a", "of", "and", "in", "the", "to", "from"]
    # Remove stopwords
    words = [word for word in words if word not in stopwords]
    # Join the cleaned words back into a string
    cleaned_review = " ".join(words)
    return cleaned_review
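A quick standalone check of the cleaning steps on a made-up review (the profanity-censoring step is skipped here, since it needs the third-party `better_profanity` package):

```python
import re

# Rerun the cleaning steps on a made-up review string
review = "Great SERVICE at the branch!! See https://example.com :)"
cleaned = review.lower()
cleaned = re.sub(r'http\S+', '', cleaned)      # remove URLs
cleaned = re.sub(r'[^a-z0-9 ]', ' ', cleaned)  # remove punctuation/special chars
stopwords = ["for", "on", "an", "a", "of", "and", "in", "the", "to", "from"]
cleaned = " ".join(w for w in cleaned.split() if w not in stopwords)
print(cleaned)  # great service at branch see
```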
# Read reviews from the text files into a list
def read_reviews_from_file(file_path):
    with open(file_path, 'r', encoding='utf-8') as file:
        reviews = [line.strip() for line in file]
    return reviews
# Clean the reviews and store them in a new list
def clean_reviews(reviews):
    cleaned_reviews = [clean_review(review) for review in reviews]
    return cleaned_reviews
# load in our higher and lower ranked reviews
file_path_higher = '../data/higher_scores/merged_higher.txt'
file_path_lower = "../data/lower_scores/merged_lower.txt"
reviews_higher = read_reviews_from_file(file_path_higher)
reviews_lower = read_reviews_from_file(file_path_lower)
cleaned_reviews_higher = clean_reviews(reviews_higher)
cleaned_reviews_lower = clean_reviews(reviews_lower)
# have a look at some sample lines from the cleaned reviews
print(cleaned_reviews_higher[0:2], cleaned_reviews_lower[0:2])
['all good', 'we ve been served by morgan she was so helpful friendly it literally took 5 mins complete everything best aib experience'] ['most pointless bank around go make lodgement you re told go across post office they can t help you with any day day banking needs avoid', 'very good bank they do help you out']
The TextBlob package can be used to compute a polarity value for each review, ranging from -1 (most negative) to +1 (most positive).
# Define the sentiment objects using TextBlob
sentiment_objects_higher = [TextBlob(review) for review in cleaned_reviews_higher]
sentiment_objects_lower = [TextBlob(review) for review in cleaned_reviews_lower]
# Create a list of polarity values and review text
sentiment_values_higher = [[review.sentiment.polarity, str(review)] for review in sentiment_objects_higher]
sentiment_values_lower = [[review.sentiment.polarity, str(review)] for review in sentiment_objects_lower]
# Create a dataframe of each review against its polarity
sentiment_df_higher = pd.DataFrame(sentiment_values_higher, columns=["polarity", "review"])
sentiment_df_lower = pd.DataFrame(sentiment_values_lower, columns=["polarity", "review"])
sentiment_df_higher.head() # inspect the sentiment dataframe
| | polarity | review |
|---|---|---|
| 0 | 0.700000 | all good |
| 1 | 0.491667 | we ve been served by morgan she was so helpful... |
| 2 | 0.386905 | had pleasure dining buddha mama last saturday ... |
| 3 | -0.016667 | tried lodge cheque machine froze 5 minutes eve... |
| 4 | 0.200000 | maria victor grateful staf thanks your help yo... |
# Extract the polarity column from each sentiment dataframe
n_high = sentiment_df_higher["polarity"]
n_low = sentiment_df_lower["polarity"]
# Explicit Series references used by the classification loops below
m_high = pd.Series(n_high)
m_low = pd.Series(n_low)
# initialize counters 'pos', 'neg' and 'neu' for each group; the loops below classify
# the reviews as positive, negative, or neutral
pos_h=0
neg_h=0
neu_h=0
pos_l=0
neg_l=0
neu_l=0
# Loop over the polarities to classify each review as Positive, Negative, or Neutral,
# and count the number of each.
for items in m_high:
    if items > 0:
        pos_h = pos_h + 1
    elif items < 0:
        neg_h = neg_h + 1
    else:
        neu_h = neu_h + 1
for items in m_low:
    if items > 0:
        pos_l = pos_l + 1
    elif items < 0:
        neg_l = neg_l + 1
    else:
        neu_l = neu_l + 1
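The counting loops above can also be written as vectorized comparisons on the polarity Series, applying the same >0 / <0 / ==0 rule in one pass; a sketch with made-up polarity values:

```python
import pandas as pd

# Made-up polarity values standing in for a group's sentiment scores
polarities = pd.Series([0.7, 0.49, -0.02, 0.0, 0.2])

# Boolean masks sum to the count of True values in each class
pos = int((polarities > 0).sum())
neg = int((polarities < 0).sum())
neu = int((polarities == 0).sum())
print(pos, neg, neu)  # 3 1 1
```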
# Data for high ranking
labels_high = ["Positive", "Negative", "Neutral"]
sizes_high = [pos_h, neg_h, neu_h]
# Data for low ranking
labels_low = ["Positive", "Negative", "Neutral"]
sizes_low = [pos_l, neg_l, neu_l]
# Create subplot
fig = make_subplots(1, 2, specs=[[{'type':'pie'}, {'type':'pie'}]])
# Create chart for high ranking branches
fig.add_trace(go.Pie(labels=labels_high, values=sizes_high, title="High Ranking Branches"), row=1, col=1)
# Create chart for low ranking branches
fig.add_trace(go.Pie(labels=labels_low, values=sizes_low, title="Low Ranking Branches"), row=1, col=2)
# Update layout
fig.update_layout(
    title_text="Sentiment Polarity for Reviews from High and Low Ranking Branches",
    title_x=0.5,
    annotations=[dict(
        x=0.5, y=-0.1,
        text="Figure 1. Sentiment polarity for high ranking branches vs lower ranking branches.",
        showarrow=False, xref="paper", yref="paper", font=dict(size=16),
    )],
)
# Display
fig.show()
A wordcloud can be used to represent the most popular words in the reviews. The words are first filtered to remove common stopwords and domain-specific words (such as 'bank' in this case).
# Create a Wordcloud from the reviews
#remove some common words including domain specific ones which will be there like 'bank' etc
common_stopwords = [
"i", "me", "my", "myself", "we", "our", "ours", "ourselves",
"you", "your", "yours", "yourself", "yourselves",
"he", "him", "his", "himself", "she", "her", "hers", "herself",
"it", "its", "itself", "they", "them", "their", "theirs", "themselves",
"what", "which", "who", "whom", "this", "that", "these", "those",
"am", "is", "are", "was", "were", "be", "been", "being", "have", "has", "had",
"do", "does", "did", "doing", "a", "an", "the", "and", "but", "if", "or",
"because", "as", "until", "while", "of", "at", "by", "for", "with", "about",
"against", "between", "into", "through", "during", "before", "after", "above",
"below", "to", "from", "up", "down", "in", "out", "on", "off", "over", "under",
"again", "further", "then", "once", "here", "there", "when", "where", "why",
"how", "all", "any", "both", "each", "few", "more", "most", "other", "some", "such",
"no", "nor", "not", "only", "own", "same", "so", "than", "too", "very", "s", "t",
"can", "will", "just", "don", "should", "now", "d", "ll", "m", "o", "re", "ve", "y",
"ain", "aren", "couldn", "didn", "doesn", "hadn", "hasn", "haven", "isn", "ma", "mightn",
"mustn", "needn", "shan", "shouldn", "wasn", "weren", "won", "wouldn", "bank","account",
"branch","service",'go',"customer","get","banking","aib","use","time","went","need","needed","would",
"told", "staff", "one"
]
# Create a Wordcloud from the high-ranking reviews
all_words_high = ' '.join([text for text in cleaned_reviews_higher])
wordcloud_high = WordCloud(width=480, height=500, random_state=21,max_font_size=110,colormap='viridis', background_color='white', stopwords=common_stopwords).generate(all_words_high)
# Create a Wordcloud from the low-ranking reviews
all_words_low = ' '.join([text for text in cleaned_reviews_lower])
wordcloud_low = WordCloud(width=480, height=500, random_state=21, max_font_size=110, colormap='viridis', background_color='white',stopwords=common_stopwords).generate(all_words_low)
# Create a subplot layout
fig, axes = plt.subplots(1, 2, figsize=(12, 6))
# Display the high-ranking word cloud in the first subplot
axes[0].imshow(wordcloud_high, interpolation="bilinear")
axes[0].set_title("High Ranking Reviews")
axes[0].axis('off')
# Display low-ranking word cloud in the second subplot
axes[1].imshow(wordcloud_low, interpolation="bilinear")
axes[1].set_title("Low Ranking Reviews")
axes[1].axis('off')
# title and caption
plt.suptitle("Wordcloud for Reviews from High/Low Ranking Branches", fontsize=16, y = 1.02)
fig.text(0.5, -0.1, 'Figure 2. Wordcloud for high ranking branches vs lower ranking branches.', fontsize=15, ha='center')
# Show plot
plt.tight_layout()
plt.show()
# take all the words from the reviews and split them using spaces
high_words = all_words_high.split(' ')
low_words = all_words_low.split(' ')
# use the same stopword filter that was used for the wordcloud above
high_without_stopwords = [ word for word in high_words if word not in common_stopwords ]
low_without_stopwords = [ word for word in low_words if word not in common_stopwords ]
# Get the top 10 most common words in each set of reviews
top_words_high = Counter(high_without_stopwords).most_common(10)
top_words_low = Counter(low_without_stopwords).most_common(10)
# Unpack tuples to separate words and frequencies
words_high, frequencies_high = zip(*top_words_high)
words_low, frequencies_low = zip(*top_words_low)
# Create a subplot with two horizontal bar charts side by side
fig = make_subplots(rows=1, cols=2, column_widths=[0.5, 0.5])
# Add bar chart for highest-ranked review words
fig.add_trace(go.Bar(y=words_high, x=frequencies_high, orientation='h', marker_color='limegreen', name="High Ranked"), row=1, col=1)
# Add bar chart for lowest-ranked review words
fig.add_trace(go.Bar(y=words_low, x=frequencies_low, orientation='h', marker_color='tomato', name="Low Ranked"), row=1, col=2)
# Update layout
fig.update_layout(
title_text="Comparison of Top 10 Words in High and Low Ranked Reviews",
title_x=0.5,
annotations=[
# Add a caption
dict(
x=0.5, # adjust position
y=-0.225,
text="Figure 3. Top 10 words in high and low ranked reviews.",
showarrow=False,
xref="paper",
yref="paper",
font=dict(size=16),
),
], width = 900
)
# Update x-axis titles
fig.update_xaxes(title_text="Frequency (High Ranked)", row=1, col=1)
fig.update_xaxes(title_text="Frequency (Low Ranked)", row=1, col=2)
# Display
fig.show()